ABSTRACT: The intron of the chloroplast trnK (UUU) gene is a Group II intron and contains an open
reading frame denoted matK. The trnK 5'-matK-trnK 3' structure is consistent in almost all examined
higher land plants and in Characeae, but not in other green algae examined. The putative gene product
MatK is the only maturase in chloroplasts. Functional chloroplast matK genes are retained even in the
nonphotosynthetic parasite Epifagus virginiana and the fern Adiantum capillus-veneris, in which
chloroplast genome rearrangement has left matK free-standing, apart from the trnK exons. Among lower
land plants, the chloroplasts of Psilotum, mosses and liverworts all have the trnK 5'-matK-trnK 3'
structure, but matK is a pseudogene in the hornwort Anthoceros formosae. In this study we found a clear
trnK 5'-matK-trnK 3' structure in Ophioglossum petiolatum, Lycopodiella cernua and Selaginella
doederleinii, but PCR with degenerate primers failed to amplify any trnK or matK fragments from
other ferns and fern allies. However, dot blot hybridization showed distinct signals in the plants that
failed to yield matK fragments by PCR, indicating that the matK sequences in those taxa may be too
divergent to amplify by an ordinary PCR approach. RT-PCR results showed that matK is expressed
in Ophioglossum petiolatum and Lycopodiella cernua, but no signal was detected in Selaginella
doederleinii. Overall, the expression patterns of matK are not consistent in lower land plants.
Phylogenetic analysis of matK sequences showed that Pinus, Ginkgo, and Cycas form a monophyletic
group, which is sister to angiosperms. Together, they form a clade that is sister to Gnetales. This ad
hoc reconstruction is likely due to the high evolutionary rate of matK.

KEY WORDS: Chloroplast matK, Lycopodiella cernua, Selaginella doederleinii, Ophioglossum 
petiolatum, Evolution. 


INTRODUCTION 

Over forty plastid genomes, including more than twenty land plants, have been completely 
sequenced and are available in GenBank, thus providing fruitful information on gene 
structures of plastid genomes. Although introns are not common in organelle genomes of land 
plants, at least 18 plastid genes have been found to harbor introns (Odintsova and Yurina, 
2003). All but one of the chloroplast introns belong to the Group II or Group III subclasses, which
have specific RNA secondary structures (Michel et al., 1989; Sugiura, 1992). The only exception is
the intron of the tRNA-Leu gene, a Group I intron that seems to have an ancient origin dating back to
cyanobacteria (Kuhsel et al., 1990; Besendahl et al., 2002). Chloroplast matK, which is
encoded by the trnK intron, is commonly present in land plants and is the only maturase-like 
gene in plant plastids. The gene was first characterized in tobacco (Sugita et al., 1985), and 
further designated as matK in mustard (Sinapis alba) based on its similar structure and 
composition to mitochondrial Group II intron-encoded maturases in yeast (Neuhaus and Link, 
1987). 

Group II introns are known for their self-splicing ability under certain conditions (Henke
et al., 1995); however, splicing is usually facilitated by nuclear-encoded proteins
(Schmelzer et al., 1983) or by the intron's own ORF (Guo et al., 1997), whose product also shows
reverse transcriptase activity. The putative protein encoded by this ORF contains a highly
conserved maturase domain (X-domain) near the C-terminal region. This domain is about 500
amino acids in length, and exists in all the Group II intron genes whose ORF is responsible for
maturase activity in chloroplasts and mitochondria, as well as in free-standing forms (Mohr et al.,
1993). The X-domain has a strongly conserved sequence, SX(3-6)TLAXKXK (a serine followed by three
to six arbitrary residues X and then TLAXKXK), and most of the sequences have a large excess of
basic over acidic amino acids (Mohr et al., 1993) and are mostly hydrophilic (Sugita et al., 1985).
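
The conserved X-domain signature described above can be written as a regular expression. The following is a minimal sketch rather than a validated motif scanner: the function name is hypothetical, and the test fragment is a short stretch as it appears in the first row of the alignment in Fig. 1.

```python
import re

# Consensus from the text: S, then 3-6 arbitrary residues, then T-L-A-X-K-X-K.
X_DOMAIN_MOTIF = re.compile(r"S.{3,6}TLA.K.K")

def find_x_domain_motif(protein: str):
    """Return (start_index, matched_text) for the first motif hit, or None."""
    hit = X_DOMAIN_MOTIF.search(protein.upper())
    return (hit.start(), hit.group()) if hit else None

# Short illustrative fragment, not a full MatK sequence.
fragment = "YILECSCMKTLAFKHKSSI"
print(find_x_domain_motif(fragment))  # (5, 'SCMKTLAFKHK')
```

Because the pattern tolerates any residue at the X positions, it will also match unrelated proteins by chance; a hit is only a hint that an X-domain may be present.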

The putative maturase function of the chloroplast MatK protein rests mostly on sequence
comparisons and on sparse data: the presence of a spliced form of the RNA in rice (Chiba et al.,
1996), the detection of the protein in Solanum (du Jardin et al., 1994), and RNA-binding activity
in Sinapis (Liere and Link, 1995). Nonetheless, the matK gene appears to be indispensable, since it
remains intact and free-standing in the highly reduced plastid genome of the parasitic Epifagus
virginiana (Wolfe et al., 1992; Mohr et al., 1993; Ems et al., 1995). Interestingly, six Group II
introns are left in the remaining 21 likely functional genes in Epifagus, and these might be the
substrates of the matK product (Ems et al., 1995; Wolfe et al., 1992).

These X-domain-containing genes in eukaryotes, however, vary in gene structure. A reverse
transcriptase (RT)-like domain is present at the N-terminal end of most mitochondrial X-domain
genes (e.g. cox1 in Marchantia), and sometimes a Zn2+-finger-like (Zn) region is present
at the C-terminal end (e.g. cox1 aI2 in yeast) (Mohr et al., 1993). The Group II
intron-encoded ORFs in both the cyanobacterium Calothrix (Mohr et al., 1993) and Lactococcus lactis
(Matsuura et al., 1997) have the RT-X-Zn motifs. It is reasonable to suggest that this might be
the most complex and complete structure of the X-domain genes. Mohr et al. (1993)
demonstrated that the region upstream of the X-domain in the MatK protein has some similarity
to conserved blocks V, VI, and VII of the RT domains of other Group II intron ORFs. The
sequenced plastid matK genes in GenBank all lack RT domain blocks I-IV and the Zn
domain found in other X-domain-containing genes, indicating that plastid matK may have lost
the RT function.

To date, the chloroplast tRNA-Lys (UUU) intron-encoded matK has only been found in
Charophytes (except for Mesostigma and Chlorokybus) and land plants (Ems et al., 1995;
Wakasugi et al., 1997; Turmel et al., 2002; Sanders et al., 2003). The intron and matK are
absent in all other green algae, Euglena, and cyanobacteria, the plastid precursors (Kotani and
Tabata, 1998; Sanders et al., 2003). Mesostigma and Chlorokybus, two algae that are usually
placed in Charophytes, do not have an intron in their chloroplast trnK genes. However, their
phylogenetic positions among green algae are uncertain in recent analyses, so they may
not belong to the Charophyte clade (Bhattacharya et al., 1998; Turmel et al., 2002). These
results suggest that the chloroplast trnK intron with matK might have been recruited by the
ancestors of Charophytes s.l. and the land plants from a mobile Group II intron, though the actual
source is unknown.

Interestingly, the chloroplast matK has been found to be a pseudogene in the hornwort
Anthoceros formosae (Kugita et al., 2003), and in some other bryophytes such as Porella and
Plagiomnium (Jankowiak et al., 2004). Other bryophytes, such as Marchantia polymorpha
(Shimada and Sugiura, 1991), Physcomitrella patens (Sugiura et al., 2003), and Sphagnum
(Jankowiak et al., 2004), all harbor an intact matK ORF. In comparison, matK is free-standing,
without flanking trnK exons, in Adiantum capillus-veneris (Wolf et al., 2003). The universal
presence and function of MatK among other ferns and fern allies remain largely
unexplored.

Despite the uncertain function of matK, the conserved DNA sequences provide a useful 
marker for phylogenetic analyses at familial to generic levels as demonstrated in many studies 
(Steele and Vilgalys, 1994; Hilu and Liang, 1997; Hu et al., 2000; Hilu et al., 2003). In this 
study, matK-like sequences were identified and characterized from selected land plants,
focusing on lycophytes, eusporangiophytes, and true ferns. 


MATERIALS AND METHODS 
Plant materials and DNA isolation 

Plant materials were all collected in Taiwan; for voucher information see Table 1.
Genomic DNA was extracted using a modified CTAB method (Porebski et al., 1997)
that reduces the effects of polysaccharides and polyphenols.


Table 1. Voucher information of plant materials used in PCR-screening and dot blot analysis. The right end of
the table shows the PCR results. Check marks indicate the presence of PCR products of the appropriate size;
dashes indicate that no check mark is given.

Family           Species                                    Voucher information     18S  rbcL  matK

Bryophytes
Anthocerotaceae  Anthoceros formosae Steph.                 CSL009, Taipei          ✓    ✓     -
Marchantiaceae   Marchantia polymorpha L.                   CSL021, Taipei (cult.)  ✓    ✓     ✓

Lycophytina
Lycopodiaceae    Lycopodiella cernua (L.) Pic.Serm.         CSL013, Taipei Co.      ✓    ✓     ✓
                 Lycopodium pseudoclavatum Ching            CSL028, Taichung Co.    -    ✓     -
Selaginellaceae  Selaginella doederleinii Hieron.           CSL012, Taipei Co.      ✓    ✓     ✓
                 Selaginella delicatula (Desv. ex Poir)     CSL011, Taipei Co.      -    ✓     -
                 Alston
                 Selaginella tamariscina (P. Beauv.)        CSL002, Nantou Co.      ✓    ✓     -
                 Spring
                 Selaginella involvens (Sw.) Spring         CSL030, Taipei Co.      ✓    ✓     -
                 Selaginella stauntoniana Spring            CSL029, Hualien Co.     -    -     -
Isoetaceae       Isoetes taiwanensis DeVol                  CSL003, Taipei (cult.)  ✓    ✓     -

Euphyllophytina
Equisetaceae     Equisetum ramosissimum Desf. subsp.        CSL001, Pingtung Co.    ✓    ✓     -
                 debile (Roxb. ex Vaucher) Hauke
Ophioglossaceae  Ophioglossum petiolatum Hook.              CSL005, Taipei (cult.)  ✓    ✓     ✓
Marattiaceae     Angiopteris palmiformis (Cav.) C. Chr.     CSL004, Taipei (cult.)  ✓    ✓     -
Osmundaceae      Osmunda banksiifolia (Presl) Kuhn          CSL024, Taipei Co.      ✓    ✓     -
Schizaeaceae     Lygodium japonicum (Thunb.) Sw.            CSL028, Taipei          ✓    ✓     -
Gleicheniaceae   Dicranopteris linearis (Burm. f.) Underw.  CSL032, Taipei          ✓    -     -
Pteridaceae      Adiantum capillus-veneris L.               CSL035, Taipei          ✓    -     -
Lindsaeaceae     Sphenomeris biflora (Kaulf.) Tagawa        CSL039, Taipei (cult.)  ✓    -     -

Genomic PCR amplification and sequence analysis 

Primers used to amplify the chloroplast trnK/matK region and rbcL genes are listed in Table 2.
These include some species-specific primers used for RT-PCR. PCR was performed
with a T-Gradient thermocycler (Biometra, Goettingen, Germany). Each PCR reaction contained 0.5 µL
Advantage™ 2 Taq polymerase (BD Biosciences Clontech, Palo Alto, CA, USA), 5 µL buffer,
4 µL dNTPs (2.5 mM each), 2 µL of each primer (10 mM), 20-200 ng genomic DNA, and
distilled water to 50 µL. The PCR program started with 5 min at 94°C, followed
by 35 cycles of 30 sec denaturing at 94°C, 90 sec annealing at 54-60°C, and 120 sec extension
at 72°C. The reaction finished with 5 min at 72°C and was then held at 4°C.
Amplified products were purified with the QIAquick PCR purification kit (QIAGEN GmbH,
Germany) and cloned into the pGEM-T Easy vector (Promega, Madison, WI, USA). Nucleotide
sequences were determined on an ABI PRISM 337 automated sequencer (Applied Biosystems)
and assembled with Sequencher 4.0 (Gene Codes Corp., Ann Arbor, MI, USA).
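
The reaction set-up and cycling program above involve some simple arithmetic, sketched below. This is only an illustration: the template volume of 1 µL is an assumed value (the text gives the DNA amount as 20-200 ng, not a volume), the function names are hypothetical, and ramping time between temperatures is ignored.

```python
# Per-reaction volumes in microlitres, as listed in the text.
COMPONENTS_UL = {
    "polymerase": 0.5,
    "buffer": 5.0,
    "dNTPs": 4.0,
    "forward_primer": 2.0,
    "reverse_primer": 2.0,
}
FINAL_VOLUME_UL = 50.0

def water_volume(template_ul: float) -> float:
    """Water needed to bring one reaction to the final volume."""
    used = sum(COMPONENTS_UL.values()) + template_ul
    if used > FINAL_VOLUME_UL:
        raise ValueError("components exceed the final reaction volume")
    return FINAL_VOLUME_UL - used

def program_minutes(cycles=35, denature_s=30, anneal_s=90, extend_s=120,
                    initial_min=5, final_min=5) -> float:
    """Total programmed hold time in minutes (no ramping included)."""
    return initial_min + cycles * (denature_s + anneal_s + extend_s) / 60 + final_min

print(water_volume(template_ul=1.0))  # 35.5 (template volume is an assumption)
print(program_minutes())              # 150.0
```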


Table 2. Primers used for genomic sequence amplification and RT-PCR.

Primer name       Sequence (5' to 3')        Taxon                     Direction

trnK1L            CTCAATGGTAGAGTACTCG        Universal primer          Forward
trnK2R            AACTAGTCGGATGGAGTAG        Universal primer          Reverse
ophio_matK317F    CAGAGATTCTTATTCGACTTCTCG   Ophioglossum petiolatum   Forward
ophio_matK872R    GCTTTAATTCCAACCATTTCAGATA  Ophioglossum petiolatum   Reverse
Lyco_matK348F     CGTAATCCTAGTTCTGCAAATTGTT  Lycopodium cernua         Forward
Lyco_matK935R     ATTAAAAATTTAGTCCCTCCCACAG  Lycopodium cernua         Reverse
Sela_d_matK456F   ACCCCAATCTCTTCATCCAG       Selaginella doederleinii  Forward
Sela_d_matK1049R  TCCATCTTGGCTTGAACCTT       Selaginella doederleinii  Reverse
rbcL35F           GATTCAAGGCTGGCGTTAAAGAT    Universal                 Forward
rbcL700R          GCGAATTCTGCCCTTTTCATCAT    Universal                 Reverse
rbcL50R           AACACCAGCTTTRAATCCAA       Universal                 Reverse
atpBR             ACATCKARTACKGGACCAATAA     Universal                 Forward
trnLc             CGAAATCGGTAGACGCTACG       Universal                 Forward
trnLd             GGGGATAGAGGGACTTGAAC       Universal                 Reverse
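
Two of the primers in Table 2 (rbcL50R and atpBR) contain IUPAC ambiguity codes (R = A or G, K = G or T), so each stands for a small pool of oligonucleotides. The sketch below, with hypothetical helper names, expands such a primer into its concrete variants and gives its reverse complement; only the codes that occur in Table 2 are handled, and whitespace inside a sequence is ignored.

```python
from itertools import product

# IUPAC codes needed for the Table 2 primers: plain bases plus R and K.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "K": "GT"}
# Complements: R (A/G) pairs with Y (T/C); K (G/T) pairs with M (C/A).
COMPLEMENT = str.maketrans("ACGTRK", "TGCAYM")

def clean(primer: str) -> str:
    """Upper-case the primer and drop any whitespace inside it."""
    return "".join(primer.split()).upper()

def expand(primer: str) -> list:
    """List every concrete sequence encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in clean(primer)))]

def reverse_complement(primer: str) -> str:
    """Reverse complement, keeping ambiguity codes as ambiguity codes."""
    return clean(primer).translate(COMPLEMENT)[::-1]

print(expand("AACACCAGCTTTRAATCCAA"))         # rbcL50R: 2 variants
print(len(expand("ACATCKARTACKGGACCAATAA")))  # atpBR: 8 variants
print(reverse_complement("AACACCAGCTTTRAATCCAA"))
```

A primer with n two-fold degenerate positions expands to 2^n sequences, so each individual oligonucleotide is present at only a fraction of the nominal primer concentration.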


Genomic dot blot hybridization 

For each sample, 0.1 µg of genomic DNA (1 µg/µL) was denatured by adding
half a volume of 2 M NaCl and half a volume of 1 M NaOH for 10 min at room temperature. The
DNA was then dotted on a pre-soaked nylon membrane (NEN™, Boston, USA). After
washing with 50 ml 1 N NaCl and 50 ml 1 M Tris-HCl for 5 min, the membrane was
crosslinked with 120,000 µJ using a Spectrolinker SL-1000 (Spectronics, Westbury, New
York, USA). The hybridization procedure basically followed Sambrook et al. (2001).
Hybridization was carried out in a buffer containing 5X SSC, 0.1% N-lauroylsarcosine, 0.1%
SDS, and 1% blocking reagent (Roche, Indianapolis, IN, USA) at 40-45°C. PCR
fragments amplified with species-specific primers from five obtained clones were labeled with
DIG-11-dUTP (Roche, Indianapolis, IN, USA) and used as probes. Four of the probes were
from matK homologues: Lycopodiella matK, Ophioglossum matK, Selaginella matK, and
Adiantum matK. One rbcL probe (Ophioglossum rbcL) was used as a positive control. Other
hybridization procedures were performed as recommended by the manufacturer (Roche, Indianapolis,
IN, USA).

RNA isolation and reverse-transcription PCR assay 

Total RNA from young vegetative tissues was isolated using the Pine Tree Method
(Chang et al., 1993). RNAs were treated with RNase-free DNase (2 units/µg RNA) (Promega,
Madison, WI, USA) at 37°C for 30 min. First-strand cDNA was synthesized with the SuperScript™
II RNase H- reverse transcriptase system (Invitrogen Life Technologies, Carlsbad, CA, USA).
The reverse primers used to synthesize first-strand cDNA are listed in Table 2, and the forward
primers were used in a secondary PCR reaction to amplify specific products. PCR conditions
were the same as described above.

Phylogenetic analysis of matK

Forty nucleotide sequences of matK coding regions, including three from this study, were
compiled into a data matrix and aligned with ClustalX (Thompson et al., 1997). The plant
information and accession numbers are listed in Table 3. Neighbor-joining (NJ) and
maximum parsimony (MP) analyses were performed with PAUP* 4.0b10 (Swofford, 2002).
For all analyses, gaps were treated as missing data, and no sites containing insertions/deletions
were excluded. The NJ analysis employed the HKY85 model
(Hasegawa et al., 1985) to estimate the distances between sequences. Parsimony search
options invoked 100 random addition sequences, tree bisection-reconnection
branch-swapping, and retention of multiple parsimonious trees. Internal support was evaluated by
bootstrap analyses (Felsenstein, 1985) and decay indices (Bremer, 1988, 1994). In the
parsimony analysis, each of 1,000 bootstrap replicates was analyzed with the heuristic search
option, invoking one random addition replicate and not retaining
multiple parsimonious trees. Decay indices (Bremer support), which quantify the extra length
needed to collapse a branch in the consensus of near-most-parsimonious trees (Bremer, 1988,
1994), were calculated with AutoDecay (Eriksson, 1998) in combination with PAUP* 4.0b10
(Swofford, 2002). Five green algal sequences were used as
outgroups in the analyses.


Table 3. Sequence information used in this study. Accession numbers refer to the NCBI GenBank database.
Asterisks indicate new sequences obtained in this study.

Family                Genus and species                                Accession

Green algae
Chaetosphaeridiaceae  Chaetosphaeridium globosum                       NC_004115
Characeae             Chara connivens                                  AY170442
Characeae             Tolypella prolifera                              AY170451
Characeae             Lychnothamnus barbatus                           AY170448
Characeae             Nitellopsis obtusa                               AY170447

Bryophytes
Funariaceae           Physcomitrella patens (Hedw.) Bruch & Schimp.    NC_005087
Marchantiaceae        Marchantia polymorpha L.                         NC_001319
Mniaceae              Plagiomnium insigne (Mitt.) T. J. Kop.           AY522574
Porellaceae           Porella platyphylla (L.) Pfeiff.                 AY168655
Sphagnaceae           Sphagnum inundatum Russow                        AY342156

Lycopodiophytes
Lycopodiaceae         Lycopodiella cernua (L.) Pic.Serm.*              AY826399
Selaginellaceae       Selaginella doederleinii Hieron.*                AY826400

Ferns
Adiantaceae           Adiantum capillus-veneris L.                     NC_004766
Ophioglossaceae       Ophioglossum petiolatum Hook.*                   AY826401
Psilotaceae           Psilotum nudum (L.) P. Beauv.                    NC_003386

Cycads
Cycadaceae            Cycas taitungensis C. F. Shen et al.             AF279795
Cycadaceae            Zamia floridana A. DC.                           AF279804

Ginkgo
Ginkgoaceae           Ginkgo biloba L.                                 AF543736

Gnetophytes
Ephedraceae           Ephedra sinica Stapf                             AF279805
Gnetaceae             Gnetum africanum Welw.                           AY449631
Welwitschiaceae       Welwitschia mirabilis Hook. f.                   AF280996

Conifers
Araucariaceae         Agathis borneensis Warb.                         AB023975
Pinaceae              Pinus thunbergii Parl.                           D17510
Taxaceae              Amentotaxus argotaenia (Hance) Pilg.             AF152219
Taxodiaceae           Taxodium distichum (L.) Rich.                    AF152212

Basal angiosperms
Amborellaceae         Amborella trichopoda Baill.                      AF543721
Annonaceae            Anaxagorea acuminata (Dunal) A. DC.              AY220436
Cabombaceae           Brasenia schreberi J. F. Gmelin                  AF092973
Calycanthaceae        Calycanthus fertilis var. ferax (Michx.) Rehder  AJ428413
Magnoliaceae          Magnolia henryi Dunn                             AF209199
Nymphaeaceae          Nymphaea odorata Aiton                           AF092988
Piperaceae            Piper crocatum Ruiz & Pav.                       AF543745

Monocots
Alismataceae          Alisma canaliculatum A. Braun & Bouche           AB040179
Arecaceae             Nypa fruticans Wurmb                             AF543743
Asparagaceae          Asparagus cochinchinensis (Lour.) Merr.          AB029804

Eudicots
Berberidaceae         Mahonia japonica (Thunb. ex Murr.) DC.           AB038184
Cactaceae             Lepismium cruciforme (Vell.) Miq.                AY015344
Ranunculaceae         Hepatica nobilis var. japonica Nakai             AB110532
Saxifragaceae         Saxifraga integrifolia Hook.                     L20131
Trochodendraceae      Trochodendron aralioides Siebold & Zucc.         AF543751


RESULTS 

PCR amplification of nuclear 18S, chloroplast rbcL and matK 

All but five of the taxa examined (Lycopodium pseudoclavatum, Selaginella stauntoniana, Selaginella
delicatula, and the two bryophytes) yielded PCR products of nuclear 18S
fragments. We sequenced six of them to confirm the identity of these products, namely
Lycopodiella cernua, Selaginella doederleinii, Ophioglossum petiolatum, Isoetes taiwanensis,
Equisetum ramosissimum ssp. debile, and Selaginella tamariscina. Partial rbcL PCR products
were successfully amplified from all examined taxa except for S. stauntoniana and the two
bryophytes. Two of them (S. doederleinii and O. petiolatum) were sequenced to confirm their
identity. Only five of the eighteen taxa examined yielded PCR products with the
trnK1L/trnK2R primer pair. After cloning and sequencing, three of them showed high similarity to
chloroplast matK sequences in GenBank. Sequences of trnK 5'-matK-trnK 3' were then
identified from L. cernua, S. doederleinii, and O. petiolatum (see Table 4). A
summary of the PCR results is shown at the right end of Table 1. An amino acid alignment of the
X domain of matK from L. cernua, S. doederleinii, O. petiolatum, and selected taxa is shown
in Fig. 1.


Table 4. Sequence information of chloroplast trnK/matK obtained in this study.

Taxa                      trnK 5' intron  matK     trnK 3' intron  Total

Lycopodiella cernua       727 bp          1554 bp  197 bp          2478 bp
Selaginella doederleinii  697 bp          1521 bp  189 bp          2407 bp
Ophioglossum petiolatum   853 bp          1344 bp  125 bp          2322 bp
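
As a quick consistency check on Table 4, the sketch below sums the three regions for each taxon and converts each matK ORF length into a codon count. It is illustrative only; the function name is hypothetical, and whether the reported ORF lengths include the stop codon is an assumption the table does not settle.

```python
# (trnK 5' intron, matK ORF, trnK 3' intron) lengths in bp, copied from Table 4.
REGIONS_BP = {
    "Lycopodiella cernua": (727, 1554, 197),
    "Selaginella doederleinii": (697, 1521, 189),
    "Ophioglossum petiolatum": (853, 1344, 125),
}

def summarize(regions):
    """Return {taxon: (total_bp, codons, leftover_bases)} for each row."""
    summary = {}
    for taxon, (intron5, orf, intron3) in regions.items():
        codons, leftover = divmod(orf, 3)
        summary[taxon] = (intron5 + orf + intron3, codons, leftover)
    return summary

for taxon, (total, codons, leftover) in summarize(REGIONS_BP).items():
    print(f"{taxon}: total {total} bp, matK {codons} codons, remainder {leftover}")
```

The totals reproduce the last column of the table (2478, 2407 and 2322 bp), and all three ORF lengths are exact multiples of three (518, 507 and 448 codons), as expected for intact reading frames.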


Genomic dot blot hybridization 

The results of dot blot hybridization are shown in Fig. 2. Similar results were obtained with the
different matK and rbcL probes. All samples gave hybridization signals except for Anthoceros
formosae and Dicranopteris linearis, where signals were very weak or invisible with all four
probes. All other samples showed distinct matK hybridization signals.

Reverse-transcription PCR assay 

For the three taxa with new trnK/matK sequences, RT-PCR was performed with
taxon-specific primers; results are shown in Fig. 3. In Lycopodiella cernua and Ophioglossum
petiolatum, both rbcL and matK yielded products from RT-PCR (Fig. 3A). Since no
intron fragment was amplified by RT-PCR, which argues against genomic DNA contamination,
these two genes appear to be expressed in both species. In comparison, only rbcL, but not matK,
is expressed in Selaginella doederleinii (Fig. 3B). We used two different intron regions as
controls and repeated the reactions; all of them showed the same pattern.

Taxa, in the order of the aligned sequences that follow:

Lychnothamnus
Nitellopsis
Chara
Tolypella
Chaetosphaeridium
Sphagnum
Porella
Marchantia
Plagiomnium
Physcomitrella
Adiantum
Ophioglossum
Psilotum
Selaginella
Lycopodiella
Gnetum
Welwitschia
Ephedra
Amentotaxus
Taxodium
Agathis
Pinus
Zamia
Cycas
Ginkgo
Nymphaea
Brasenia
Amborella
Anaxagorea
Magnolia
Calycanthus
Piper
Hepatica
Mahonia
Trochodendron
Saxifraga
Lepismium
Nypa
Asparagus
Alisma


RISTKMPVSNLIHSLSLVDLCNIQGYPIHKATWSVLNDEKIINIFYKLWKNILLYYSGCSNRRDLGKIQYILECSCMKTLAFKHKSSITSTWKKYNKYLSF 

RISTKMPVFNLIHSLSLVHLCNIEGYPIHKAKWSVLNDEKIINIFSQLWKNILLYYSGCSNRRDLGKIQYILEFSCMKTLAFKHKSSIRSTWKQYNKYLSF 

RISTKMPVFNLIHRLSVMHLCNLEGYPIHKAAWSVFNDKQIMNIFSNLLRNILLYYSGCSNRSDLGKIQYILEFSCMKTLAFKHKSSIRSTWTQYKKHVSF 

KISTEIPVHSLISNLTTIKLCNRKGYPIHKASWSTFSDKDIINIYHKFWNELSLYYCGSSNRFDLSQIQYIFEFSCIKTLAFKHKSNIRLTWEQYKEYVSF 

KICINVPIKLLIIFLSKNGFCDISGNSKSKLSWSVLQDIEIIEKFRRLWLTISGYYSGSSNKYCLKIVLYILRYSCAKTLACKHKMSLKKIWKKYTLNLSV 

ELCSITPILSLIGLLAREGFCDALGHPISKLAWSTLTDEAIFNRFDQIWRNLFCYYSGCQNRKNLYQVQYILRFSCAKTLACKHKNTIRSVWKKYDLKFLT 

EFCGIIPIVPLIILLARERFCDTSGRPICKLSWTTLADNEIFKQFDQITKNIFRYYSGCIKKKGLYQLQYILRFSCAKTLACKHKSTIRTVWKRYGSNFVT 

EFCSIIPVIPLIRLLAKEKFCDVLGRPLCKLSWTTLSDNEIFERFDQIIKHIFSYYSGCINKKGLYQLQYIFRFSCAKTLACKHKSTIRTVWKKYGSNLLT 

ELYSITPISSLIELLAKEKFCDILGHPISKLAWSTLTDDEIFNRFDQIWKNFFYYYSGCKSKKNLYQVQYILRFSCAKTLACKHKSTIRYVWKKHGSNFFA 

EIYSITPISSLIELLAKENFCDTLGHPISKLAWTTLTDDEIFNRFDQIWRNFFYYYSGCKNKKNLYQVQYILRFSCAKTLACKHKSTIRYVWKKYGSNFFA 

KFYPKIPNSIIITTLAKQRFCDFTGRPIGKSAWVTSTDDKIIDGYVQLWQVFSLYYGASMNQYRLRRLIFLLQMSCDSTLAGKHRSTIRLLRCKSNVEALN 

VLCPKVPTSLSIRSLAREGFCNGLGFPISRSAWATSTDTDTTNRFNRLWKNLFIYYSGSSGLGGLYRIRYILRFSCAKTLACKHKSTIRAVWKRFGSRFNL 

EFCASIPTSSLIESLTREGFCDSSGRPVGRSTWTILKDDDILNKYHQIWGDLSCYYSGSFSRDGLWRAKYILQLSCAKTLAQKHKSTTRWRNHFGLKFIT 

ELCPIIPFLLLVNSLARGGFCTNLGRPVSKLSWTTLTDDDILKKFDQIWRSVYYYYSGSINNHGLFRLRYIFRFSCAKTLACKHKSTTRIVWKRFSLNSFL 

ELCVIIPVFRLIQLLTKEKFCNTSGRPISKSAWTTFQDDDILNQFNHIWKNLFYYYSGCLNRSDLYQIQYILRFSCAKTLACKHKSTIRWWKKYGSRLFP 

ELSPQIQVISMIEFFSIEGFCDITGKPISKLSWIRFTDDSIFDRYDRSWKFLYYYYSGVINKGSLDRVKYILLFSCFKTLALKHKSTIRWRKEFDVKLFN 

EFHPKFGIISIMKFLSIEGFCDIMGRPISKLSWTCFTDDDIFDKCDRFWKILYYYYCGAKNKAYLDRIKYILLLSCFKTIAFKHKSTIRWRKEFDFELRK 

ELNSKLSAVFVIQFLSKEGLCDIMGNPKSKLAWLSFTDNSILDKYDHFCRNVDSFYSEAINKRFLDRVKDILFLSCIKTLACKHKSTIRIVRKELGFELRK 

ELNPIAPIRSILFFLAKERFCDISGQTISKLSWTSLSDDDILDRFDRICRNLFHYYSGSINPDGLYYIKYILLLPCAKTLACKHKSTIRWREESGSELFT 

ELNPIAPIRSILFFLAKEKFCDISGWPISKLSWASLSDDDILDRFDRIWINLFHYYSGSINQDGLYHIKYILLLSCAKTLACKHKSTIRWREQLGSELFT 

ELDPIAPIRSIIGLLAKERFCDISGRPICKLAWTSLSDDDILDRFDRICRNLFHYYSGSFNQDGLYSIKYILLLSCAKTLACKHKSTIRWREELGSELFT 

EMDPIVPIVPIIGLLATEKFCDISGRPISKLSWTSLTDDDILDRFDQIWRNLFHYYSGSFDRDGLYRIKYILLLSCAKTLACKHKSTIRWRKELGPELFK 

EFDPIAPTTLLIGSLAKEKFCDISGHPISRLAWTGLTDDDILDRFDRIWRNIFHYYSGSSKKDGLYRMKYILRLPCAKTLACKHKSAIRWRERFGSELFT 

EFDPIAPTTLLIGYLAKERFCDISGRPTGRLAWTGLTDDNILHRFDRIWRNILHYYSGSSKKDGLYRMKYILRLPCAKTLACKHKSAIRWRERFGSELFT 

EFDSIAPIIPLIGLLAKERFCDISGRPISKLAWTGLKDDDILDRFDRICRNIIDYYSGSFNKDGSYRMKYILRLPCAKTLACRHKSTIRWWEEFGSELFT 

RFDTIVPIFPLIGSLVKAKFCNVSGYPISKSVWADSSDSDIIARFGWICRNLSHYHSGSSKKHSLCRIKYILRLSCARTLARKHKSTVRAICKRLGSKLLE 

RFDTIVPIFPLIGSLVKAKFCNVSGHPTSKSVWADLSDSDIIARFGWICRNLSHYHSGSSKKHSLCRIKYILRLSCARTLARKHKSTVRAICKRLGSKLLE 

RFDTWPTIFLIGSLAKVKLCNVSGHPISKSVWADSSDSDILDQFGRICRNLSHYHSGSYKKHSLCRIKYILRLSCARTLARKHKSTVRAILKRLGSEFLD 

KFETLVPIIPLIGSLAKAKFCDVSG7PISKSARADSSNSDIINRFGRIYRNISHYHSGSSKKQTLYRIKYILRLSCARTLARKHKSTVRAFLKRLGSKFLE 

KFETLVPIIPLIGSVAKAKFCNVSGHPISKSVRADSSDSDIINRFGRIYRNLSHYHSGSSKKQTLYRIKYILRLSCARTLARKHKSTVRAFLKRLGSEFLE 

KFETIVPIIPLIGSLAKAKFCNGSGHPISKPFRTDLSDSEIINRFGRICKNLSHYHSGSSKKQSLYRIKFILRLSCARTLSRKHKSTVRAFLKRLGSELLE 

KFETIVPIISLIDSLSKEKFCNLSGHPTSKAIWSDLSDSDIMERFGRVCRNLSHYYSGCSKKQILYRIKYILRLSCARTLARKHKSTVRTFLKKLGSGFLK 

KFDTIVPIIPLIGSLSKAKFCNFSGHPISKPAWADSSDSDIIDRFGRICRNLSHYYSGSSKKKSLYRIKYILRLSCARTLARKHKSTVRSFLKRLGSEFLE 

KFYTIVPIITLIGSLANSKFCNASGHPISKSARTDSSDSDIIDRFGRICRNLSHYYSGSSKKKSLYRVKYILRLSCARTLSRKHKSTVRSFFKRLGSELLE 

KFDTIVPISPLIGSLA7AKFCTYQGIPISKPVRADSSDSDIIDRFGRICRNLSHYHSGSSKKKSLYRIKYILRLSCARTLASKHKSTVRAFLKRFGSELLE 

KFDIIVPIIPLIRSLAKAKFCNLVGDPISKPAWADSSDSYIIDRFVRICRNIYHYHSGSSKKNCLYRVKYILRLSCARTLARKHKSTVRAFLKRLGSGLLE 

KFDTIVRIIPLVGSLAKAKFCNVLGHPISKSVWTDLLDSDIIDRFGRICRNLSHYYSGSSRKKSLYRIKYILRLSCARTLARKHKSTVRAFLKRLGSEFLE 

7FDTRVPVISFIGSLAKAKFCTVSGHPISKPIWTDLSDCDIIDRFGRICRNLSNYLSGSSKKQSLYRIKYILRFSCARTLARKHKSMVRAFLQRLGSGLLE 

KFDTWPVILLIRSLAKAKFCTVSGHPISKPIWADLSDSDILDRFGRICRNLSHYHSGSSRKRGLYRIKYILRLSCARTLARKHKSTVRTFLRRLGSGLLE 

KFDTIVPIIPLIGSLSKAKFCNVSGHPISKPIWTDLSDSDIIDRFVRICRNLSHYHSGSSKKQSLYRMKYILRLSCARTLARKHKTTVRAFFQRLGSGFLE 


Fig. 1. Alignment of MatK X-domain comprised of 101 amino acids, for 40 taxa. Asterisk shows the putative 
RNA binding sites suggested by Mohr et al. (1993). 

Fig. 2. Results of dot blot hybridization. A: Probed with Lycopodiella matK. B: Probed with Ophioglossum matK.
C: Probed with Selaginella matK. D: Probed with Adiantum matK. E: Probed with Ophioglossum rbcL. F: The
number and position of each dot. The corresponding taxa are as follows: 1. Anthoceros formosae, 2. Marchantia
polymorpha, 3. Equisetum ramosissimum, 4. Isoetes taiwanensis, 5. Selaginella doederleinii, 6. Selaginella
delicatula, 7. Selaginella stauntoniana, 8. Selaginella involvens, 9. Selaginella tamariscina, 10. Lycopodiella
cernua, 11. Lycopodium pseudoclavatum, 12. Ophioglossum petiolatum, 13. Angiopteris palmiformis, 14.
Osmunda banksiifolia, 15. Adiantum capillus-veneris, 16. Dicranopteris linearis, 17. Lygodium japonicum, 18.
Sphenomeris biflora, 19. Nicotiana sylvestris, 20. plasmid harboring Selaginella doederleinii matK, 21. plasmid
harboring Lycopodiella cernua matK, 22. plasmid harboring Ophioglossum petiolatum matK.

Fig. 3. Results of RT-PCR. A: Results from Lycopodiella cernua (denoted as "L") and Ophioglossum petiolatum
(denoted as "O"). B: Results from Selaginella doederleinii. RNA templates were used in the RT-PCR reactions
on the left, and genomic DNA templates were used on the right as controls in both (A) and (B). M: PCR of matK
region; R: PCR of rbcL region; rl: PCR of rbcL-atpB intron; tl: PCR of trnL intron.

Phylogenetic analysis of matK 

The data matrix based on nucleotide sequences of matK coding regions contains 1867
characters with 1665 variable sites, 1498 of which are parsimony-informative. All
characters were included in the phylogenetic analyses. Phylogenies obtained by
neighbor-joining and parsimony methods are similar, and the results are shown in Fig. 4.
Eighteen equally most parsimonious trees were found, with a length of 10059, a consistency index
of 0.37 and a retention index of 0.56. Bryophytes, ferns, and fern allies are unresolved at the base
of the embryophyte clade; in these taxa the sequences are also very difficult to align except
for the X domain. Gnetophytes, conifers+cycads+Ginkgo, and angiosperms form three very
well supported clades, all receiving 100% bootstrap values. The resolution is relatively low
within angiosperms, except for the well supported eudicots (93-95% bootstrap support),
represented by Hepatica, Mahonia, Trochodendron, Saxifraga, and Lepismium.
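
To make the site categories reported above concrete, the sketch below counts variable and parsimony-informative columns in a small made-up alignment (not data from this study), treating gaps as missing data as in the analyses here. A column counts as parsimony-informative when at least two character states each occur in at least two sequences. The function name is hypothetical, and the last two lines simply restate the reported counts as proportions of the 1867 characters.

```python
from collections import Counter

MISSING = set("-?")  # gaps and unknowns are treated as missing data

def classify_sites(alignment):
    """Return (variable, parsimony_informative) column counts."""
    variable = informative = 0
    for column in zip(*alignment):
        counts = Counter(c for c in column if c not in MISSING)
        if len(counts) > 1:
            variable += 1
            # informative: two or more states, each seen in two or more taxa
            if sum(1 for n in counts.values() if n >= 2) >= 2:
                informative += 1
    return variable, informative

# Hypothetical toy alignment of four sequences, five columns.
toy = ["ACGTA", "ACGTT", "ATGCA", "ATG-T"]
print(classify_sites(toy))  # (3, 2)

print(round(1665 / 1867 * 100, 1))  # 89.2 (% variable sites reported)
print(round(1498 / 1867 * 100, 1))  # 80.2 (% parsimony-informative sites reported)
```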

Fig. 4. Strict consensus tree of eighteen equally most parsimonious trees based on chloroplast matK nucleotide
sequences. Numbers above the branches are bootstrap values using parsimony (before slash) and
neighbor-joining (after slash) criteria. Numbers below the branches are decay indices.

DISCUSSION 

In this study, we have identified plastid trnK/matK, rbcL and nuclear 18S rDNA
sequences from three taxa: Lycopodiella cernua, Selaginella doederleinii, and Ophioglossum
petiolatum. We failed to obtain matK sequences from other ferns and fern allies using
degenerate primers or different PCR conditions (data not shown). However, the
presence of matK is confirmed by dot blot hybridization, suggesting that their matK might be too
divergent to be obtained using an ordinary PCR approach. RT-PCR results show that, even if the
matK gene is present with an intact open reading frame, it may not be expressed, as demonstrated
in Selaginella doederleinii. This raises the question of whether matK is essential for
Group II intron splicing in lower land plants. Since the matK homologue in Anthoceros
formosae has been suggested to be a pseudogene (Kugita et al., 2003), we speculate that
chloroplast matK may indeed not be essential in primitive lineages of land plants. Nonetheless,
the presence of introns in chloroplast genomes is largely correlated with the presence of matK.
Fig. 5 shows a simplified diagram of green plant phylogeny and the presence/absence of
introns in their chloroplast genomes. Chlorella vulgaris and all green algae other than
Characeae show an intact trnK without intron interruption. Characeae and all of the land
plants, in comparison, show the trnK 5'-matK-trnK 3' structure. The figure shows that the
presence of introns in many genes coincides with the presence of matK residing in trnK.
Although there are several intron gains and losses along the lineages of land plants, most
introns persist throughout evolutionary history.

Fig. 5. A simplified diagram of green plant phylogeny and the distribution of introns in their chloroplast genome.
The phylogeny is according to Pryer et al. (2001), APG II (Bremer et al., 2003), and Palmer et al. (2004). Gray
boxes indicate genes with introns and white boxes indicate genes without intron disruption. The dotted boxes in
Epifagus and Adiantum represent the matK gene in free-standing form (trnK exons are missing).

Given that these Group II introns are capable of self-splicing, matK probably
played a minor role when chloroplasts first harbored introns in their genomes. We speculate
that, as evolution proceeded, matK gained a more important function in Group II intron
splicing, thus becoming indispensable.

The sequences of matK are very divergent even in the usually highly conserved X domain, 
especially among Charophytes, ferns and fern allies. However, the serine (S) residue in the X 
domain, marked in Fig. 1, is highly conserved among all sequences, consistent with the 
universal presence among Group II introns (Mohr et al., 1993). Only four taxa show a
replacement by proline at this position: Amentotaxus, Zamia, Cycas, and Ginkgo. Whether
this has any functional or evolutionary significance remains to be tested.

Phylogenetic analysis using matK sequences poses certain difficulties, since they are likely
too divergent to resolve relationships with confidence. Although the relationships among ferns and
fern allies are unresolved, the peculiar placement of angiosperms, gnetophytes, and the rest of
seed plants is quite interesting. The position of Gnetales is one of the most enigmatic
questions in seed plant phylogeny (see review by Burleigh and Mathews, 2004). They have
been placed with various groups of seed plants, but most recent molecular studies support that
Gnetales are related to conifers (Chaw et al., 2000; Rydin et al., 2002; Soltis et al., 2002). The
placement of Gnetales sister to the rest of the seed plants ("Gnetales-sister" tree) has only
been found in a few studies (Hamby and Zimmer, 1992; Sanderson et al., 2000; Rydin et al.,
2002). Although the internal support of the Gnetales-sister tree is quite high in the matK tree (Fig.
4), we speculate that this might be due to methodological problems imposed by the high
evolutionary rate in the matK phylogeny, as demonstrated by Burleigh and Mathews (2004).
Phylogenetic analysis using nucleotide sequences of the X domain region only showed that all
gymnosperms formed a monophyletic group under the parsimony criterion (data not shown).
However, relationships within angiosperms are peculiar in that some eudicots become the
basal group of angiosperms in this analysis of the X-domain data set, although the support for
this tree topology is low (<50% bootstrap value). This suggests that it may not be easy to
extract correct phylogenetic information from matK genes, at least at higher taxonomic
levels, and a thorough analysis is needed to elucidate this problem.

It is clear that matK is quite divergent in ferns and fern allies, and the failure to amplify 
the matK region using ordinary PCR indicates that we have to use other approaches in order 
to obtain these sequences. Such data are much needed especially in true ferns, where there is 
only one very divergent sequence available ( Adiantum capillus-veneris). 

(Manuscript received 27 October 2004; accepted 24 November 2004)

Email: jmhu@ntu.edu.tw


